{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "pwywywUOqgOV"
   },
   "source": [
    "# LlamaIndex + DeepEval Integration\n",
    "\n",
    "This code tutorial shows how you can easily integrate LlamaIndex with DeepEval. DeepEval makes it easy to unit-test your LLMs.\n",
    "\n",
    "You can read more about the DeepEval framework here: https://docs.confident-ai.com/docs/framework\n",
    "\n",
    "Feel free to check out our repository here: https://github.com/confident-ai/deepeval\n",
    "\n",
    "![Framework](https://docs.confident-ai.com/assets/images/llm-evaluation-framework-example-b02144720026b6d49b1e04d8a99d3d33.png)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "ZfRYToRM4NxG"
   },
   "source": [
    "### Set-up and Installation\n",
    "\n",
    "We recommend setting up and installing via pip!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "3SLBSPwHqfJh"
   },
   "outputs": [],
   "source": [
    "!pip install -q -q llama-index\n",
    "!pip install -U -q deepeval"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "Ew1PitGb4V_G"
   },
   "source": [
    "This step is optional and only if you want a server-hosted dashboard! (Psst I think you should!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "-Px9Fr9GRczP"
   },
   "outputs": [],
   "source": [
    "!deepeval login"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "taqcdkaYYwTs"
   },
   "source": [
    "### Testing for factual consistency"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "usj5Dp3Mhztn"
   },
   "outputs": [],
   "source": [
    "from llama_index.response.schema import Response\n",
    "from typing import List\n",
    "from llama_index.schema import Document\n",
    "from deepeval.metrics.factual_consistency import FactualConsistencyMetric"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "hL0J32q7kI6X"
   },
   "source": [
    "## Setting Up The Evaluator\n",
    "\n",
    "Setting up the evaluator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9vRW0D_vkdW7"
   },
   "outputs": [],
   "source": [
    "from llama_index import (\n",
    "    TreeIndex,\n",
    "    VectorStoreIndex,\n",
    "    SimpleDirectoryReader,\n",
    "    LLMPredictor,\n",
    "    ServiceContext,\n",
    "    Response,\n",
    ")\n",
    "from llama_index.llms import OpenAI\n",
    "from llama_index.evaluation import FaithfulnessEvaluator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "nEpJv40lkhhx"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import openai\n",
    "\n",
    "api_key = \"sk-XXX\"\n",
    "openai.api_key = api_key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "jm6qakfZahMd"
   },
   "outputs": [],
   "source": [
    "gpt4 = OpenAI(temperature=0, model=\"gpt-4\", api_key=api_key)\n",
    "service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)\n",
    "evaluator_gpt4 = FaithfulnessEvaluator(service_context=service_context_gpt4)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "1rKB7Hk0lCtP"
   },
   "source": [
    "### Getting a LlamaHub Loader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "nHnMc3QhlBV8"
   },
   "outputs": [],
   "source": [
    "from llama_index import download_loader\n",
    "\n",
    "WikipediaReader = download_loader(\"WikipediaReader\")\n",
    "\n",
    "loader = WikipediaReader()\n",
    "documents = loader.load_data(pages=[\"Tokyo\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "fChhgGLJkViR"
   },
   "outputs": [],
   "source": [
    "tree_index = TreeIndex.from_documents(documents=documents)\n",
    "vector_index = VectorStoreIndex.from_documents(\n",
    "    documents, service_context=service_context_gpt4\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "nTFoA4RP2_BJ"
   },
   "source": [
    "We then build an evaluator based on the `BaseEvaluator` class that requires an `evaluate` method.\n",
    "\n",
    "In this example, we show you how to write a factual consistency check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "HXHou8Yw3guL"
   },
   "outputs": [],
   "source": [
    "from typing import Any, Optional, Sequence\n",
    "from llama_index.evaluation.base import BaseEvaluator, EvaluationResult\n",
    "\n",
    "\n",
    "class FactualConsistencyEvaluator(BaseEvaluator):\n",
    "    def evaluate(\n",
    "        self,\n",
    "        query: Optional[str] = None,\n",
    "        contexts: Optional[Sequence[str]] = None,\n",
    "        response: Optional[str] = None,\n",
    "        **kwargs: Any,\n",
    "    ) -> EvaluationResult:\n",
    "        \"\"\"Evaluate factual consistency metrics\"\"\"\n",
    "        if response is None or contexts is None:\n",
    "            raise ValueError('Please provide \"response\" and \"contexts\".')\n",
    "        metric = FactualConsistencyMetric()\n",
    "        context = \" \".join([d for d in contexts])\n",
    "        score = metric.measure(output=response, context=context)\n",
    "        return EvaluationResult(\n",
    "            response=response,\n",
    "            contexts=contexts,\n",
    "            passing=metric.is_successful(),\n",
    "            score=score,\n",
    "        )\n",
    "\n",
    "\n",
    "evaluator = FactualConsistencyEvaluator()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "background_save": true
    },
    "id": "rWOduygYYuPe",
    "outputId": "2140b979-d9ab-407d-83f7-4e875191de02"
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/transformers/convert_slow_tokenizer.py:470: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.\n",
      "  warnings.warn(\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'success': True, 'score': 0.97732705}\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python3.10/dist-packages/deepeval/metrics/metric.py:42: UserWarning: API key is not set. Please set it by visiting https://app.confident-ai.com\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "query_engine = tree_index.as_query_engine()\n",
    "response = query_engine.query(\"How did Tokyo get its name?\")\n",
    "eval_result = evaluator.evaluate_response(response=response)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "id": "z3-ZPq3a35pI"
   },
   "source": [
    "## Other Metrics\n",
    "\n",
    "We recommend using other metrics to help give more confidence to various prompt iterations, LLM outputs etc. We think ML-assisted approaches are required to give performance for these models.\n",
    "\n",
    "- Overall Score: https://docs.confident-ai.com/docs/measuring_llm_performance/overall_score\n",
    "- Answer Relevancy: https://docs.confident-ai.com/docs/measuring_llm_performance/answer_relevancy\n",
    "- Bias: https://docs.confident-ai.com/docs/measuring_llm_performance/debias"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
